(54) Title: DOMAIN BASED CONGESTION MANAGEMENT

(57) Abstract: The Domain-based Congestion Management method and apparatus detects and regulates congestion in a Diff-serv
network. It uses an improvRED method for congestion detection at the core routers and token bucket filters for traffic regulation at the
ingress nodes. In addition, improvRED also provides feedback control. ImprovRED uses three thresholds for detecting congestion:
a minth, a maxth and a FeedbackThreshold, which takes a value between the minth and the maxth thresholds. Whenever the average
queue size is greater than minth and less than FeedbackThreshold, all outgoing packets are marked appropriately to indicate a
potential onset of a congestion period. When the average queue size is greater than FeedbackThreshold (but less than maxth), packets
are dropped probabilistically and all the outgoing packets are marked appropriately to denote the dropping phase. When the average
queue size is greater than the maximum threshold, all incoming packets are dropped. Feedback, in the form of a Local Congestion
Notification (LCN) message, is used to notify the ingress nodes of a likely onset of congestion. Ingress nodes immediately respond
to the congestion notification by appropriately regulating their respective traffic rates (i.e., the amount of packets they inject into the
Diff-serv network). The amount of traffic (or data packets) injected into the core of the Diff-serv domain is controlled by a token
bucket filter at each of the ingress nodes.

DOMAIN BASED CONGESTION MANAGEMENT



FIELD OF INVENTION

This invention is related to the field of congestion management schemes for controlling
the flow of packets over the Internet.

BACKGROUND OF INVENTION 

Potential congestion periods can occur for a number of possible reasons such as i) 
burstiness that is inherent in nodes and generated due to statistical multiplexing at nodes
along a given path; and ii) non-adaptive greedy applications that may often cause 
(potential) congestion periods leading to severe packet loss conditions which affect other 
sessions that share network resources. Congestion avoidance and management schemes 
are essential for a better utilization of network resources. 

Generally, congestion control schemes have two phases, viz. i) early congestion detection
and avoidance; and ii) a congestion management scheme that begins to operate once a
congestion period occurs. Several congestion management schemes have been proposed
so far. For example, binary feedback-based congestion management schemes rely on end
sources to react to the congestion messages. Similarly, the current Internet relies on
end-to-end congestion control mechanisms through either packet dropping or explicit
congestion notification (ECN) by marking packets of a session. However, the end-to-end
reaction to congestion is critically dependent on the round trip time (RTT) between the end
hosts.

Explicit rate management algorithms have also been proposed in the context of ATM.
However, the explicit rate notification schemes that indicate the rates to which the
individual traffic sources have to adapt are too complex to be implemented and require
extensive processing at the core switches (or routers). They also need to generate
periodic resource management (RM) cells that cause extra traffic. Furthermore, these
schemes, in particular, require per-flow state maintenance that cannot be tailored easily to
suit the heterogeneous Internet.

Core state-less fair queuing (CSFQ) has been proposed in order to eliminate the
bookkeeping of each flow state at core routers. However, the key focus of CSFQ is on
achieving fair bandwidth allocation. It relies on end hosts (traffic sources) to
detect available bandwidth or congestion at the bottleneck nodes. The long round trip
time between a given pair of source and destination nodes can lead to late reaction by the
sources to the congestion notification. As a result, CSFQ may not be adequate in
reducing packet loss.

Differentiated services (Diff-serv) over the Internet Protocol (IP) have been proposed to
avoid maintaining state information of a large number of flows at the core routers. In
addition, Diff-serv moves the complexity of per-flow bandwidth management to
intelligent edge routers. A Diff-serv cloud comprises i) a set of edge nodes known as
ingress or egress nodes, depending on the traffic flow direction, that may maintain
per-flow state and ii) a set of core nodes that do not maintain per-flow state information and
carry a large number of aggregated flows (see Fig. 1).

Overview of Diff-serv architecture 


The crux of Differentiated Services (DS) is that packets get different levels of service
based on Type of Service (TOS) bits. These include i) traffic policing that leads to
marking of the packets that are out of profile (violation of some traffic parameter as
specified, e.g., peak-rate); ii) packet dropping and buffering strategies at various routers,
also known as Per-Hop-Behaviors (PHBs); and iii) choice of an appropriate queue that
maps to the type of service that was chosen by the application as indicated by the TOS
bits. The flow or flow-aggregate information is maintained only at a few selected routers,
such as edge routers. Thus, per-flow/aggregate monitoring is avoided at core routers.

The PHBs that run on core routers can be adaptively tuned to compensate for the loose
admission control at the edges where traffic of various classes is injected into the
network with a goal of predictable QoS. However, best-effort service still constitutes a
considerable amount of net traffic. The allocation of the bandwidth available for best
effort depends on the policy of individual Internet Service Providers (ISPs) and the
service level agreements with other neighboring DS domains.

Currently there are two classes of services defined in the context of Diff-serv, viz.: i) the
Assured service (AS) and ii) Premium service (PS). They are respectively mapped onto
Assured forwarding (AF) and Expedited forwarding (EF) per-hop behaviors (PHBs).
The AF PHB forwards packets according to their priorities. Thus, in the event of
congestion, high priority packets receive better service than low priority packets. The EF
PHB aims at reducing the queuing delays and emulates a leased line service from the end
user's perspective.

Nevertheless, congestion management schemes are essential for good network utilization,
even with priority-based packet handling schemes. Potential congestion periods can arise
and it is difficult to assess the available bandwidth unless the core routers are enhanced
with robust resource management schemes. Thus, each of the ingress nodes (unaware of
the onset of a congestion period) can potentially inject more traffic into the core network of a
Diff-serv domain. ECN has been proposed; however, ECN requires end-hosts to
interact and respond to the congestion notification.

RED

Active queue management algorithms, such as Random Early Detection (RED), can be
employed in order to detect and avoid any potential network collapse due to congestion.
Congestion detection can be based on buffer monitoring by setting a threshold value for
buffer occupancy. However, simple buffer occupancy-based techniques may not be
sufficient to handle bursty traffic because bursty traffic may temporarily lead to a buffer
occupancy greater than the threshold value. This leads to frequent congestion
avoidance/management triggering mechanisms. In contrast to simple buffer monitoring,
the RED algorithm calculates an average queue size by using a low-pass filter with an
exponential weighted moving average (EWMA). With a constant w_q (0 < w_q < 1), with
the arrival of the nth packet, the average queue size is given as follows

$$ \mathrm{avgQsize}_n = (1 - w_q)\,\mathrm{avgQsize}_{n-1} + w_q\,\mathrm{currentQsize}_n \qquad (1) $$



The allowed range of the average queue size before packets are dropped determines the
allowed burst sizes. Thus RED can accommodate traffic bursts unlike drop-tail FIFO- 
based queue thresholds, as the average queue size does not solely depend on the current 
queue size. 
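
As an illustration of equation (1), the following minimal Python sketch updates the average queue size on each packet arrival. The function name and the weight value 0.002 are assumptions chosen for the example and do not come from the text above.

```python
def update_avg_queue_size(prev_avg, current_qsize, w_q):
    """Return the new average queue size per equation (1).

    prev_avg      -- avgQsize for the previous packet arrival
    current_qsize -- instantaneous queue size seen by the arriving packet
    w_q           -- EWMA weight, strictly between 0 and 1
    """
    if not 0.0 < w_q < 1.0:
        raise ValueError("w_q must lie strictly between 0 and 1")
    return (1.0 - w_q) * prev_avg + w_q * current_qsize


# Illustrative run: a short burst of 50 queued packets barely moves the
# average when w_q is small (0.002 is an assumed example value).
avg = 0.0
for qsize in [0, 0, 50, 50, 0, 0]:
    avg = update_avg_queue_size(avg, qsize, w_q=0.002)
```

Because the average reacts slowly, a transient burst does not by itself push the average past a threshold, which is the burst tolerance the paragraph above describes.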

RED employs two queue thresholds, i.e., minth and maxth. Whenever the average queue
size is between the minth threshold value and the maxth threshold, the RED algorithm drops
(or marks) packets randomly with a certain probability P_drop, indicating an incipient
congestion. If the average queue size exceeds the maxth, it drops all the packets until the
average queue size falls below the maxth threshold. The probability of dropping is a
function of the average queue size and is given by

$$ P_{drop} = P_{max}\,\frac{\mathrm{avgQsize} - \mathrm{min}_{th}}{\mathrm{max}_{th} - \mathrm{min}_{th}} \qquad (2) $$

where P_max is the maximum probability of a packet drop. It is shown that the average
queue size is substantially decreased with random packet dropping. This mitigates the
tail-dropping effects and the associated synchronization of various TCP (application)
back-offs (reduction in traffic transmission rate).
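
The RED drop decision of equation (2) can be sketched in Python as follows. This is a minimal sketch of the classic RED rule as described above, not the improved method of the invention. The function names are hypothetical, and treating an average exactly equal to maxth as "drop all" is an assumption, since the text only says "exceeds".

```python
import random


def red_drop_probability(avg_qsize, min_th, max_th, p_max):
    """Drop probability as a function of the average queue size (equation 2)."""
    if avg_qsize < min_th:
        return 0.0          # below minth: no drops
    if avg_qsize >= max_th:
        return 1.0          # at/above maxth: drop everything (boundary is assumed)
    # Linear ramp from 0 at minth up to p_max just below maxth.
    return p_max * (avg_qsize - min_th) / (max_th - min_th)


def red_should_drop(avg_qsize, min_th, max_th, p_max, rng=random.random):
    """Randomly decide whether to drop (or mark) the arriving packet."""
    return rng() < red_drop_probability(avg_qsize, min_th, max_th, p_max)
```

For example, with minth = 5, maxth = 15 and P_max = 0.1, an average queue size of 10 sits halfway along the ramp and yields a drop probability of 0.05.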
SUMMARY OF THE INVENTION

The present invention is a method and apparatus that uses thresholds for regulating
congestion. It deterministically marks outgoing packets by setting an LCN bit(s) when an
average queue size of a token bucket filter is between a minimum threshold and a
feedback threshold. In addition, it probabilistically drops incoming packets and marks all
outgoing packets when the average queue size is between a feedback threshold and a
maximum threshold. Finally, all incoming packets are dropped when the average queue
size equals or exceeds said maximum threshold.

In another preferred embodiment, the present invention is an apparatus and method for
regulating traffic flow in a differentiated services network between nodes. First, a core
node detects congestion. Next, an egress node sends a congestion feedback notification
message to at least one ingress node. In response, the ingress node reduces its traffic rate
in proportion to the amount of traffic that it was injecting into the network when
congestion was detected.

In still another preferred embodiment, the present invention comprises a method and
apparatus for regulating the traffic rate at an ingress node by varying the number of
tokens consumed by a data packet and transmitting the data packet if the number of
tokens consumed by the packet is less than the available tokens.

In still another preferred embodiment, the present invention is an apparatus for
controlling traffic flow in a differentiated services domain. It is comprised of a plurality
of three types of nodes: ingress, egress and core nodes. Furthermore, each ingress node
has a corresponding token bucket filter which is used to regulate the flow of data packets.
The token bucket filter is comprised of a token generator and a bucket to hold at least one
generated token.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 contains the architecture of a Diff-serv domain.

Figure 2 illustrates the improvRED method.

Figure 3 illustrates a simple 2-bit scheme used to indicate the onset of a congestion 
period at the core nodes. 

Figure 4 is the definition of the DSCP byte.

Figure 5(a) illustrates a token bucket filter connected to a Diff-serv domain. 

Figure 5(b) illustrates the components of a token bucket filter. 


Figure 6(a) is a flowchart for the Token Bucket Filter-based rate control method. 

Figure 6(b) illustrates a Token Bucket Filter-based congestion management method used 
with ingress nodes. 


Figure 7 illustrates the varying of packet weight with demand and LCN messages. 

Figure 8 depicts a possible discrete state implementation of the algorithm in figure 7. 

Figure 9 depicts a simulation setup.

Figure 10 shows the performance of the DCM method vs. non-feedback based congestion 
control. 

Figure 11 illustrates the delay performance of the DCM method.

Figures 12(a) and 12(b) depict a sample average queue size and the distribution of
packet weight at an ingress Token Bucket Filter for a utilization factor of 0.8.

Figures 13(a) and 13(b) illustrate the distribution of congestion periods for a non-DCM
method at the core node.

Figures 14(a) and 14(b) illustrate the drop phase duration for the DCM method for the
utilization factors 0.8 and 0.9 respectively.

Figure 15 illustrates the performance of the DCM method with domain-RTT variation.

DETAILED DESCRIPTION OF THE INVENTION

The present invention is a feedback-based congestion control for a Diff-serv domain
called Domain-based Congestion Management (DCM). One improvement it has over
existing congestion control schemes is the advantage of shorter RTTs between a given
pair of ingress/egress nodes of a Diff-serv domain. This is in contrast to the long
end-to-end RTTs of existing congestion control schemes that invariably result in large latency of
reaction to transient congestion periods. In addition, the present invention is not complex
and requires no flow state maintenance at the core routers. Therefore, it can react quickly
to transient congestion periods that occur locally within a Diff-serv cloud. Furthermore,
shorter RTTs between a given pair of ingress/egress nodes can lead to faster detection
and better utilization of the transient available bandwidth in the core Diff-serv domain.

The present invention improves upon the Random Early Detection (RED) and explicit
congestion notification mechanisms to handle best-effort traffic. The DCM scheme is
based on an improvement to the RED scheme (improvRED) of low complexity running
on the core routers and an adaptive Token Bucket Filter (TBF)-based traffic regulation
at the ingress nodes. It is a method and apparatus that allows all the ingress nodes (I) to
share available bandwidth and adjust their rates of traffic injection such that the average
congestion periods are reduced inside the Diff-serv domain. This leads to an overall
improvement in utilization.

The DCM scheme is a distributed congestion control algorithm that runs on each of the 
ingress nodes. On the one hand, it helps the ingress nodes to avoid packet loss during 
congestion periods and, on the other hand, detects available bandwidth during 
congestion-free periods. In addition, the RED mechanism is improved to distinguish 
between an onset of a likely congestion period (marking phase) and a persistent 
congestion period that invariably incurs packet drops (dropping phase). 

Feedback, in the form of a Local Congestion Notification (LCN) message (or message),
is used to notify the ingress (or input) nodes (I) of a likely onset of a congestion (free) 
period. (In a preferred embodiment, the Local Congestion Notification message is an 
LCN bit(s) set in a data packet). In addition, an associated feedback control loop is 
introduced into the DCM scheme. Upon detecting the LCN bit set by any of the 
congested core nodes (C), egress (or output) nodes (E) shall report to corresponding 
ingress nodes (I) about the onset of a congestion period. The ingress nodes, as a result, 
shall respond to LCN messages. The LCN messages are used to indicate the congestion 
state at the relevant core routers, based on the average queue size at their TBFs. 

The average queue size at the end of an adaptation interval (set to RTT) associated with a
given pair of ingress/egress nodes (I/E) is used as a measure of demand for network
bandwidth at the underlying ingress nodes (I). However, traffic rates associated with
each ingress node (I) shall be varied in proportion to the amount of traffic each ingress
node (I) is injecting at the onset of congestion. During the congestion management
period, the ingress node (I) that injected more traffic at the time of the onset of congestion
shall be responsible for a greater reduction in transmission rates during the congestion
recovery period. This leads to a fair resource utilization among ingress nodes (I).

The Domain-based Congestion Management Method

The DCM scheme comprises three main features: use of an improvRED method and
apparatus for congestion detection at the core nodes or routers (C), congestion feedback
in the form of LCN bits, and use of token bucket filters (TBF) to regulate traffic at ingress
nodes (I).

Improved RED

Random probabilistic dropping/marking of packets affects individual sessions
proportional to their traffic rates. In existing congestion control schemes, egress nodes
cannot estimate the degree of congestion at a bottleneck core router through the ECN bits of
the packets. Yet, it would be beneficial if the egress nodes (E) are able to detect a
potential onset of a congestion period so as to minimize packet losses and to delay or
prevent a corresponding congestion period. Thus, the present invention provides an
early feedback so as to minimize packet losses and to delay or prevent a corresponding
congestion period. The DCM scheme introduces an improved RED, called improvRED,
that basically improves the original RED to provide feedback control.

In addition, an additional threshold, FeedbackThreshold (or Feedback), is introduced
in the present invention. It takes a value between the minth (or minimum) and the maxth
(or maximum) thresholds. The improvRED method is under a deterministic rather than
probabilistic marking phase whenever the average queue size is greater than minth
and less than FeedbackThreshold. During this phase, all outgoing packets are marked
appropriately to indicate a potential onset of a congestion period, and the phase involves no packet
drops 130.

When the average queue size is equal to or greater than FeedbackThreshold, but less than
maxth, packets are dropped probabilistically 140 and all the outgoing packets are marked
appropriately to denote the dropping phase 150. Both the marking and dropping phases
are considered as indicators of potential congestion state at core nodes. These phases are
experienced by the core nodes (C) to varying degrees as a result of the following
condition (for an onset of a congestion period)

$$ \sum_{j} IR_j(t) > R_{cong} \qquad (3) $$

where IR_j(t) denotes the incoming traffic rate at the congested node associated with
ingress node (I) j at time t and R_cong is the service rate (link capacity) of the corresponding
congested core node (C). Condition 3 is a necessary condition to drive the core node to a
potential congested state through queue build-ups. As a result of the above condition the
underlying core node (C) shall be in either the drop or the mark phase for short durations
until the DCM scheme regulates the traffic such that the congested node is brought back
to a congestion-free state. (The state where the average queue size is less than minth is
referred to as a congestion-free state). When the average queue size is greater than
maxth, all incoming packets are dropped 160.

The purpose of introducing a FeedbackThreshold is to avoid packet
drops before any congestion control schemes kick in. The improvRED method is
illustrated in Figure 2.
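
The four regimes of improvRED described above can be summarized with a small Python sketch. The names `Phase` and `improvred_phase` are hypothetical. Two boundary choices are assumptions: an average exactly equal to minth is treated as the marking phase (the text leaves this case open), and an average equal to maxth is treated as drop-all, following the summary's "equals or exceeds" wording.

```python
from enum import Enum


class Phase(Enum):
    CONGESTION_FREE = "congestion_free"  # no marking, no drops
    MARKING = "marking"                  # mark all outgoing packets, no drops
    DROPPING = "dropping"                # probabilistic drops, mark all outgoing
    DROP_ALL = "drop_all"                # drop every incoming packet


def improvred_phase(avg_qsize, min_th, feedback_th, max_th):
    """Classify the average queue size into one of the improvRED phases."""
    if not min_th < feedback_th < max_th:
        raise ValueError("expected minth < FeedbackThreshold < maxth")
    if avg_qsize < min_th:
        return Phase.CONGESTION_FREE
    if avg_qsize < feedback_th:
        return Phase.MARKING      # avg == minth lands here (assumed)
    if avg_qsize < max_th:
        return Phase.DROPPING     # FeedbackThreshold <= avg < maxth
    return Phase.DROP_ALL         # avg >= maxth
```

With thresholds (5, 10, 15), average queue sizes of 3, 7, 12 and 20 fall into the congestion-free, marking, dropping and drop-all phases respectively.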

In order to indicate the onset of a congestion period at the core nodes (C), feedback in the
form of a simple 2-bit scheme is proposed that is depicted in Figure 3. The bits (bit1,
bit2) serve as a notification to the egress nodes (E) of a specific congestion state in the
Diff-serv domain. (The bits are set by the core nodes (C)). The egress nodes (E) shall
appropriately notify corresponding ingress nodes (I) of the potential congestion at the
core nodes (C).

The details of the integration of the 2-bit scheme with the last two bits of the
differentiated services codepoint (DSCP) byte follow. (It is assumed that an LCN bit is
available that is reset at every ingress node (I) and core nodes (C) can set it whenever
they are congested).

LCN Message



As discussed above, feedback, in the form of a Local Congestion Notification (LCN) bit
or bits, is used to notify the ingress nodes (I) of a likely onset of a congestion period. The IP
(Internet protocol) provides the Type of Service (TOS) byte in the IP packet that is
used for explicit classification and the type of treatment (priority) the packet should
receive at the intermediate routers. The TOS byte has been redefined as the differentiated
services codepoint (DSCP) byte in the context of Diff-serv. The definition of the DSCP byte
is described in K. Nichols, S. Blake, F. Baker, and D. Black, Definition of the
Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers (RFC 2474), work
in progress, 1998, hereby incorporated by reference, and summarized in Figure 4. The
first leftmost 6 bits of the DSCP byte are intended to define various PHBs and their
associated services. Bits 7 and 8 are used for explicit congestion notification (ECN).
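
To make the bit layout concrete, the following Python sketch splits a one-byte value into the six leftmost bits and bits 7 and 8. It assumes bit 1 is the most significant bit of the byte (consistent with "first leftmost 6 bits"); the function name is hypothetical.

```python
def split_ds_byte(ds_byte):
    """Split the DSCP byte into (phb_bits, bit7, bit8).

    Bit 1 is taken to be the most significant bit (an assumption based on
    the "leftmost 6 bits" wording), so bit 8 is the least significant bit.
    """
    if not 0 <= ds_byte <= 0xFF:
        raise ValueError("ds_byte must fit in one byte")
    phb_bits = ds_byte >> 2       # bits 1-6: PHB selection
    bit7 = (ds_byte >> 1) & 1     # bit 7
    bit8 = ds_byte & 1            # bit 8
    return phb_bits, bit7, bit8
```

For instance, the byte 0b10111001 carries 0b101110 in its six leftmost bits, with bit 7 clear and bit 8 set.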

Whenever a router detects congestion, it sets the ECN bit (bit 8) of the DSCP byte so that
the receiver can alert the source of the network congestion at an intermediate router. The
source node, in turn, may respond to the ECN bit by adaptively decreasing the traffic
transmission rate. This mechanism is incorporated into many transport protocols
such as TCP and contributes to a healthy network that can avoid congestion collapse.



Benefits of ECN include: i) avoidance of collapse of the network, and ii) flexibility of
adapting to network conditions by the end applications. The Internet can provide an
indication of an incipient congestion when using an active queue management scheme
such as RED. In a preferred embodiment, the response to the ECN bit set packet by the
sender is essentially the same as the response to a dropped packet, i.e., the sending node
lowers its transmission rate. In addition, ECN can be incrementally deployed in both
end-systems and routers.

When an ECN bit set packet is received by a router, the ECN bit is left unchanged and
the packet is transmitted. With existing congestion control schemes, when severe
congestion has occurred and the router's queue is full, then the router drops a packet
when a new packet arrives. However, such packet losses will become relatively
infrequent under the improvRED congestion control mechanism because the explicit
notification can also be implemented through marking packets rather than dropping them.
In an adequately provisioned network in such an ECN-Capable environment, packet
losses will then occur primarily during transients or in the presence of non-cooperating
sources.

Bit 7 (Fig. 4) is used as the ECN-Capable Transport (ECT) layer bit in the present
invention. In a preferred embodiment, it is targeted towards TCP. The ECT bit would be
set by the data sender to indicate that the end-points of the transport protocol are
ECN-capable. See K. K. Ramakrishnan, S. Floyd, B. Davie, A Proposal to Incorporate ECN in
MPLS, Internet draft: draft-ietf-mpls-ecn-00.txt, work in progress, 1999, hereby
incorporated by reference. The ECN bit would be set by the router to indicate congestion
to the end nodes. Routers that have a packet arriving at a full queue would drop the
packet.

The proposed usage of Bit 7 as an ECT bit in routers comes from recognizing that a
packet, instead of being dropped, can, in fact, be instrumental in decreasing the traffic
injected into the network by the corresponding transport protocol (such as TCP).
However, this can be potentially misleading and dangerous, as misbehaving sources
(transport protocols) can set this bit and maintain/increase the current traffic rate even
though the ECN bit is set. However, one can look at the protocol field in the IP packet
and determine the nature of the transport, i.e., whether it is adaptive or not.

A uniform numbering system can be evolved for such a purpose. For example, one can
allocate protocol numbers less than, say, 512 for the adaptive, ECN-capable
transmission protocols and above 512 for non-adaptive/ECN-capable protocols. This
facilitates having a meaningful protocol number allocation and avoids the consumption of
extra bits in the IP packet.

Thus, in a preferred embodiment, bit 7 (Fig. 4) can be used as a Local Congestion
Notification (LCN) bit. Using bit 7 as the LCN bit does not preclude normal ECN
operations, and in fact, the global ECN bit can be set at the egress nodes (or egress
routers) (E) depending on the LCN bit and the individual characteristics, such as
adaptability, of the incoming (ECN-aware) flows.

The LCN bit assumes only local importance with respect to the Diff-serv domains. It is
reset at the ingress node (or ingress router) (I) and set at any of the routers within the
same Diff-serv domain that detects an incipient congestion. The LCN bit is intended to
convey congestion local to the Diff-serv domain to an egress node (E). The egress node
(E) can then alert the corresponding ingress node (I) of the potential congestion. The
ingress node (I), in turn, can take appropriate measures on notification of the local
congestion. Thus, the 2-bit scheme described above and shown in Fig. 3 can be
integrated with currently unused bits of the DSCP byte.

In a preferred embodiment, the egress nodes (E), upon detecting packets with (bit1, bit2)
marked as either (1,0) or (1,1), inform the corresponding ingress nodes (I) the first time
they detect such a marking. In addition, they report the first time of local domain
congestion clearance whenever they see the marking of either (0,0) or (0,1) (i.e., local
domain congestion clearance) after they previously notified a congestion period to the
corresponding ingress nodes (I). The packet marking can be efficiently implemented at
the output port without going through the entire queue every time at the core nodes (or
core routers) (C). The feedback-based local congestion control algorithm that utilizes the
LCN scheme is described next.
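
The egress-side reporting rule described above (notify on the first congestion marking, then notify on the first clearance marking after that) can be sketched as a two-state tracker. This is a minimal sketch: the class name, the returned strings, and the use of one tracker instance per ingress node are assumptions made for illustration.

```python
class EgressCongestionTracker:
    """Tracks, for one ingress node, whether congestion has been reported."""

    def __init__(self):
        self.congestion_reported = False

    def observe(self, bit1, bit2):
        """Process the (bit1, bit2) marking of one arriving packet.

        Returns "congestion" or "clearance" when the ingress node should be
        notified, and None when no new notification is needed.
        """
        # (1,0) and (1,1) signal congestion; (0,0) and (0,1) signal none,
        # so bit1 alone distinguishes the two cases. bit2 is kept in the
        # signature only to mirror the 2-bit marking.
        congested = bit1 == 1
        if congested and not self.congestion_reported:
            self.congestion_reported = True
            return "congestion"
        if not congested and self.congestion_reported:
            self.congestion_reported = False
            return "clearance"
        return None
```

Feeding the markings (0,0), (1,0), (1,1), (0,1), (0,0) in order produces exactly one congestion notification (on the second packet) and one clearance notification (on the fourth).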

Feedback Control

The feedback method of the present invention operates as follows. The available
bandwidth in the core Diff-serv network is subject to transient congestion periods. As a
result, it is difficult to get a good estimate of the available bandwidth. Each ingress node
(I), unaware of the exact amount of available bandwidth and in order to achieve higher
network bandwidth utilization, injects its traffic (in the form of data packets) until a
congestion feedback notification is received from a corresponding egress node (E). The
egress node (E) notifies all the ingress nodes (I) that share the congested bottleneck link.
The ingress nodes (I) immediately respond to the congestion notification by appropriately
regulating their respective traffic rates (i.e., the amount of packets they inject into the
Diff-serv network). Thus, the ingress nodes (I) cut back their transmission rates upon
local congestion notification.

Once the congestion is cleared at the bottleneck, the egress node (E) informs all the 
ingress nodes (I) that were previously notified about the onset of a congestion period. 
However, each ingress node (I) shall increment the rates at random times. This avoids
correlation/synchronization among the ingress nodes (I). When another transient 
congestion period occurs, the cycle repeats with another local congestion notification. 

In the present invention, the egress nodes (E) identify the ingress nodes (I) that are to be
informed of the onset/occurrence of a local congestion within the Diff-serv cloud. Route
pinning enhances and ensures that consistent service provisioning is feasible, if at all.
See R. Guerin, A. Orda, QoS-based Routing in networks with inaccurate information:
Theory and algorithms, IEEE/ACM Trans. on Networking, June 1999, hereby
incorporated by reference. Route pinning is feasible either through a source routing IP
option or some form of label mapping onto fixed paths between a given pair of ingress
and egress (edge) routers (or nodes) (I,E) of a Diff-serv domain. It determines a path
between a given pair of ingress (I) and egress nodes (E).

Label-based route pinning is one of the easiest ways for identifying the ingress points of
the packets received at the egress node (E). A label mechanism for route pinning is a
more attractive option because of its simplicity. Furthermore, it can be preconfigured by
a network administration. A label, denoted by j, represents a node-tuple < ingress, i1, ...,
in, egress >, where in denotes a node in the corresponding route within the Diff-serv
domain (equivalent to a virtual circuit). A label that is attached to the incoming packets
at the ingress node (I) ensures that the packet is switched/forwarded through a
pre-selected route between a given pair of edges of a Diff-serv domain. Many individual
flows between a given pair of ingress (I) and egress (E) nodes can be mapped onto the same
label. Labels have only local significance with respect to a given Diff-serv domain.

Traffic regulation at ingress nodes depends on 1) the demand and 2) the congestion state
of the core domain, which is determined by the ingress nodes (I) from LCN messages
from corresponding egress nodes (E). This regulation is described next.

Token Bucket Filter (TBF)-based Rate Control Method

A token bucket-based rate control method is widely employed as a traffic
policer/conditioner and was originally proposed in J. Turner, New Directions in
Communications, IEEE Communications Mag., Oct. 1986, hereby incorporated by
reference. A token bucket filter (TBF) consists of two components, viz., i) a token
generator (TG) and ii) a token bucket (TB) to hold the generated tokens (see Figures
5(a) and 5(b)). It can be characterized as (R^j, BD^j) where R^j denotes the token generation
rate and BD^j denotes the bucket depth. Each incoming packet at a TBF consumes (or
deletes) tokens from the bucket (TB), if available, when it is transmitted by the associated
ingress node (I). Generally, the number of tokens consumed is equal to the packet size
(Pkt_size) measured appropriately in terms of bits.

In the present invention, the amount of traffic (or data packets) injected into the core of
the Diff-serv domain is controlled by a token bucket (TB) at each of the ingress nodes (I).
The TBF-based rate control method disclosed in the present invention is used to vary the
number of tokens consumed by data of unit size that is represented by PktWt^j 200.
Therefore the number of tokens consumed by a packet of size Pkt_size is Pkt_size *
PktWt^j. The TBF at an ingress node (I) shall transmit a packet if the following modified
condition is satisfied. This condition essentially regulates the outgoing traffic rate.

Pkt_size * PktWt^j <= available tokens in the bucket (4) 210
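
Condition (4) can be illustrated with a small Python sketch of a token bucket whose per-packet token cost is scaled by a packet weight. The class and method names are hypothetical, and starting with a full bucket and refilling by elapsed time are assumptions, not details taken from the description.

```python
class WeightedTokenBucket:
    """Token bucket filter characterized by a rate, a depth and a packet weight."""

    def __init__(self, rate, depth, pkt_wt=1.0):
        self.rate = rate        # token generation rate (tokens per second)
        self.depth = depth      # bucket depth (maximum tokens held)
        self.tokens = depth     # assumed: the bucket starts full
        self.pkt_wt = pkt_wt    # tokens consumed per unit of packet size

    def refill(self, elapsed_seconds):
        """Add generated tokens, never exceeding the bucket depth."""
        self.tokens = min(self.depth, self.tokens + self.rate * elapsed_seconds)

    def try_send(self, pkt_size):
        """Transmit only if Pkt_size * PktWt <= available tokens (condition 4)."""
        cost = pkt_size * self.pkt_wt
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False
```

With a weight of 1 the filter behaves like an ordinary token bucket. A weight below 1 lets more traffic through for the same token supply, and a weight above 1 throttles the ingress node, which is how varying the weight regulates the injected traffic rate.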

In a preferred embodiment, the PktWt^j depends on two factors, viz., i) demand for
network bandwidth at an ingress node (I) (or demand), and ii) state of the core domain
with respect to congestion experienced at any of the nodes along the route associated with
the label j. The demand for network bandwidth at an ingress node (I) is indicated by the
average queue size at the TBF. The average queue size at a TBF is estimated according
to equation 1, as in the case of RED. If the average queue size at a TBF is greater than a
threshold (or demand threshold or threshold value) denoted by DemandThresh^j, then it is
inferred that the bandwidth demand (or demand) is high; else, the demand is said to be
low. The TBF-based traffic regulation at ingress nodes responding to local congestion
notifications is described in Figures 6a and 6b.
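
The transmit condition (4) and the demand test can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name IngressTBF, its field names, and the averaging weight are hypothetical, and the average queue size is assumed to be an exponential moving average of the RED kind, since equation 1 is not reproduced in this passage.

```python
from dataclasses import dataclass


@dataclass
class IngressTBF:
    """Token bucket filter (R, BD) whose per-bit token cost is scaled by PktWt."""

    rate: float                 # R: tokens generated per second
    depth: float                # BD: bucket depth in tokens
    demand_thresh: float        # DemandThresh: average queue size threshold
    pkt_wt: float = 1.0         # PktWt: tokens consumed per bit of data
    tokens: float = 0.0         # tokens currently in the bucket
    avg_queue: float = 0.0      # running average of the TBF queue size
    ewma_weight: float = 0.002  # hypothetical averaging weight (assumption)

    def refill(self, elapsed_s: float) -> None:
        """Add the tokens generated in elapsed_s seconds, capped at the depth."""
        self.tokens = min(self.depth, self.tokens + self.rate * elapsed_s)

    def try_send(self, pkt_size_bits: float) -> bool:
        """Transmit only if Pkt_size * PktWt <= available tokens (condition 4)."""
        cost = pkt_size_bits * self.pkt_wt
        if cost > self.tokens:
            return False
        self.tokens -= cost
        return True

    def observe_queue(self, queue_len: float) -> None:
        """Update the average queue size with an exponential moving average."""
        w = self.ewma_weight
        self.avg_queue = (1.0 - w) * self.avg_queue + w * queue_len

    def demand_is_high(self) -> bool:
        """Demand is inferred to be high when the average exceeds the threshold."""
        return self.avg_queue > self.demand_thresh
```

With PktWt below 1 the same bucket admits more bits per token, and with PktWt above 1 it admits fewer, which is how the weight regulates the outgoing rate without changing R or BD.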

During congestion-free periods, depending on the demand for bandwidth at a given
ingress node (TBF), the PktWt^j is varied. If the demand for bandwidth increases
monotonically, the PktWt^j decreases monotonically 220. Thereby a packet of unit size
consumes fewer tokens (= Pkt_size * PktWt^j) than when PktWt^j is ideally/originally 1. If
the average queue size is greater than the threshold value DemandThresh^j, then the
PktWt^j is further allowed to decrease so that in the next round trip cycle, more packets are
allowed to pass the TBF and enter the core Diff-serv domain 230. However, during
either low demand or congestion-free RTT cycles, the PktWt^j approaches 1.

Congestion Notification 

Upon the receipt of a congestion notification by the ingress nodes, the PktWt^j is increased
according to the current value of PktWt^j at the time of the notification 240. The lower
the PktWt^j at the time of congestion notification, the higher the PktWt^j will be for the next
RTT cycles during congestion control 250. (Therefore, the value of PktWt^j is varied in
inverse relation to its current value at the time of notification.) Until a congestion clearance
notification is received (it is assumed that feedback control messages are never lost), the
PktWt^j is maintained at a value greater than or equal to unity. Thus, a more aggressive
ingress node (I) that previously had a high demand for network bandwidth shall drive the
PktWt^j to a small value during subsequent congestion-free periods. If the demand is
persistent, then it is more likely that the PktWt^j is well below the ideal/original value of 1.
Hence at the time of congestion notification the nodes with smaller PktWt^j s shall
decrease the traffic rate more than those with smaller demand. The method is graphically
depicted in Figure 7.

Figure 8 depicts a possible discrete state implementation of the method disclosed in
Figure 7, where PktWt^j has (2N + 1) states. The middle state (state 0)
corresponds to the state where PktWt^j equals 1. Any state n (n ∈ {1..N}) to the left
(right) of the state 0 is the state where its PktWt^j is n levels lower (higher) than
level 0 (PktWt^j = 1), i.e., its PktWt^j < (>) 1. The extreme end states have PktWt^j assuming
a value of either minPktWt^j or maxPktWt^j respectively. The states to the left of the
middle state (PktWt^j = 1) have their PktWt^j in the range [minPktWt^j, 1) and, similarly, the
states on the right have a PktWt^j assuming values in the range (1, maxPktWt^j]. The
dotted lines correspond to transitions in response to LCNs of the onset of congestion
periods, while the solid lines denote the transitions in response to demand at ingress
nodes (I) and the state of congestion inside the core domain. The mapping of states on
the left side to the states on the right of the middle state in response to LCNs is a uniform
mapping (i.e., a mapping from [minPktWt^j, 1) onto (1, maxPktWt^j]) (c.f. Figure 6, at
the congestion notification time).
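
The discrete-state description above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the levels are assumed to be linearly spaced between minPktWt and 1 and between 1 and maxPktWt, and the uniform mapping is assumed to send state -n to state +n when an LCN arrives.

```python
class PktWtStates:
    """2N + 1 discrete PktWt levels; state 0 corresponds to PktWt = 1."""

    def __init__(self, n_levels: int, min_wt: float, max_wt: float) -> None:
        self.n = n_levels
        self.min_wt = min_wt
        self.max_wt = max_wt
        self.state = 0  # negative: left of middle (PktWt < 1); positive: right

    def pkt_wt(self) -> float:
        """Map the current state to a weight (linear spacing is an assumption)."""
        if self.state < 0:
            return 1.0 - (1.0 - self.min_wt) * (-self.state) / self.n
        return 1.0 + (self.max_wt - 1.0) * self.state / self.n

    def on_congestion_free_cycle(self, demand_high: bool) -> None:
        """High demand moves one level left; low demand drifts back toward 0."""
        if demand_high:
            self.state = max(-self.n, self.state - 1)
        elif self.state < 0:
            self.state += 1
        elif self.state > 0:
            self.state -= 1

    def on_lcn(self) -> None:
        """Onset of congestion: mirror left states, step right states one up."""
        if self.state < 0:
            self.state = -self.state
        elif self.state > 0:
            self.state = min(self.n, self.state + 1)
        # state 0 (PktWt = 1) is left unchanged
```

Under these assumptions, a node that has moved three levels left during high demand jumps three levels right on an LCN, so the most aggressive nodes are throttled the most.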
Configuration

Since each of the traffic aggregates associated with a label j is treated with equal priority, the
configuration of the TBF parameters is based on max-min fairness. See D. Bertsekas and
R. Gallager, Data Networks, pp. 527-528, Prentice Hall, 1992, hereby incorporated by
reference. It is based on the assumption that the Diff-serv topology is static over the
operating period. Therefore, if the topology changes, the relevant parameters can be
recomputed (reconfigured) easily.

Each of the TBFs at a given ingress node (I) and the associated label j that denotes a
unique route to any egress node (E) has three basic parameters to be configured, viz., i) R^j,
ii) BD^j, and iii) the range of PktWt^j (where R^j denotes the token generation rate and BD^j
denotes the bucket depth or size). Let R^j_bn and B^j_bn denote the max-min fair share of
bandwidth and buffer size respectively at a bottleneck (bn) core node. The R^j_bn and B^j_bn
are calculated according to the max-min fair resource algorithm by considering bandwidth
and buffer size as independent resources shared by traffic aggregates corresponding to
unique labels (j s). Without much loss of generality, the buffers at all core nodes (C)
inside a core domain are assumed to be of the same magnitude. This implies that the rate
bottleneck of a traffic aggregate is the same as its buffer bottleneck (i.e., the same bn). Thus,
the TBF denoted by (R^j, BD^j) has the following as initial configuration values.

R^j = R^j_bn    (5)
BD^j = B^j_bn    (6)

Furthermore, the initial value of PktWt^j is set to 1.
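
As an illustration of how a max-min fair share such as R^j_bn might be obtained, the following Python sketch applies the textbook progressive-filling (water-filling) procedure to a single shared resource. It is a simplified sketch, not the configuration procedure of the invention: it assumes one bottleneck link only, and the function name and the per-label demand inputs are hypothetical.

```python
from typing import Dict


def max_min_fair_share(capacity: float, demands: Dict[str, float]) -> Dict[str, float]:
    """Water-filling allocation of one resource among competing labels."""
    alloc: Dict[str, float] = {}
    remaining = capacity
    # Serve the smallest demands first; each pass offers an equal split.
    pending = sorted(demands.items(), key=lambda kv: kv[1])
    while pending:
        share = remaining / len(pending)
        label, demand = pending[0]
        if demand <= share:
            # This label needs less than an equal split: grant it in full.
            alloc[label] = demand
            remaining -= demand
            pending.pop(0)
        else:
            # Every remaining label wants more than the split: cap them all.
            for other, _ in pending:
                alloc[other] = share
            break
    return alloc
```

For two labels with unbounded demand sharing one bottleneck, the sketch gives each label half the capacity. The same routine could be applied separately to bandwidth and to buffer size, since the text treats them as independent resources.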

The values R^j and BD^j of a TBF do not change, but the PktWt^j that determines the rate of
consumption of tokens from the token bucket shall vary according to two factors: 1) demand
for bandwidth and 2) congestion state inside the core of a Diff-serv domain. In order to
determine the range of PktWt^j for each of the TBFs, denoted by [minPktWt^j, maxPktWt^j],
the effect of changing PktWt^j on the outgoing traffic from a TBF denoted by (R^j, BD^j) is
examined. (The maximum transmission unit (MTU) along a route corresponding to label
j is a given.)

Range of Values [min, max] for PktWt^j

In order to be able to transmit a packet when PktWt^j assumes the value maxPktWt^j, the
following should be satisfied:

maxPktWt^j * MTU <= BD^j    (7)

The maxPktWt^j is determined by the following equation:

maxPktWt^j = BD^j / MTU    (8)

If the condition (Eq. 7) is violated, then a head-of-line (HOL) problem can occur, since a
front packet of MTU size in the queue cannot find enough tokens for its transmission.
The HOL can lead to blockage of all packets waiting at the TBF queue even though the
packets are eligible for transmission, irrespective of the token generation rate R^j. The
transmission rate decreases to R^j/(maxPktWt^j) when PktWt^j assumes the value maxPktWt^j.

In order to limit the queue and packet loss due to buffer overflow at an ingress TBF
during congestion periods at the core, it is desirable to have a minimum transmission rate
(MTR) at each of the ingress nodes (with MTR^j less than R^j). The MTR^j is the minimum
data buffer drain rate from the ingress queue. The MTR^j can be determined from the
following equations:

(R^j_peak - MTR^j) * T_burst = B^j    (9)
MTR^j < R^j    (10)



where R^j_peak is the peak rate and T_burst is the burst duration of the incoming traffic,
and B^j is the data buffer of the ingress queue. Thus, maxPktWt^j is finally calculated as

maxPktWt^j = min{ BD^j / MTU, R^j / MTR^j }    (11)

Note that the maximum delay at the ingress queue is given by B^j / MTR^j.
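
The calculation in Eqs. 9 to 11 can be worked through with a short Python sketch. The function names are hypothetical, and the reading of Eq. 9 as (peak rate - MTR) * burst duration = ingress buffer size is an assumption about symbols that are only partly legible in the source.

```python
def min_transmission_rate(peak_rate: float, burst_time: float, buffer_size: float) -> float:
    """Solve (R_peak - MTR) * T_burst = B for MTR (assumed reading of Eq. 9)."""
    return peak_rate - buffer_size / burst_time


def max_pkt_wt(bucket_depth: float, mtu: float, token_rate: float, mtr: float) -> float:
    """maxPktWt = min(BD / MTU, R / MTR), as in Eq. 11."""
    if not 0.0 < mtr < token_rate:
        # Eq. 10 requires MTR < R; a non-positive MTR gives no usable bound.
        raise ValueError("MTR must satisfy 0 < MTR < R")
    return min(bucket_depth / mtu, token_rate / mtr)


def max_ingress_delay(buffer_size: float, mtr: float) -> float:
    """Worst-case queuing delay at the ingress queue, B / MTR."""
    return buffer_size / mtr
```

The first term of the minimum avoids head-of-line blocking (an MTU-sized packet must always fit in a full bucket), and the second keeps the outgoing rate R / PktWt from falling below the MTR.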



On the other hand, minPktWt^j is determined based on an optimistic condition that the
ingress node (I), when active, has the whole buffer of the intermediate bottleneck node
(bn) at its disposal. That is, the maximum amount of data that can be injected at one time
into the core domain when the token bucket (TB) is full is bounded by the buffer size of
the intermediate bottleneck node (bn). Thus the minPktWt^j that determines the maximum
amount of data is given by:

minPktWt^j = BD^j / B_bn = R^j / R_bn    (12)

Note that the second equality comes from the fact that a max-min fair share is deployed
in the domain. In addition, it is assumed that the max-min fair share of bandwidth at the
buffer-bottleneck node (bbn) is the same as the bandwidth at the bottleneck node (bn).
As the PktWt^j approaches the minPktWt^j, the maximum transmission rate at the TBF
approaches R_bn.
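
The limiting values can be checked with a short derivation. It assumes that the outgoing rate of a TBF is R^j / PktWt^j and that a full bucket releases at most BD^j / PktWt^j units of data, both of which follow from the token-consumption rule Pkt_size * PktWt^j.

```latex
\begin{align*}
\left.\frac{R^{j}}{\mathit{PktWt}^{j}}\right|_{\mathit{PktWt}^{j}=\mathit{minPktWt}^{j}}
  &= \frac{R^{j}}{R^{j}/R_{bn}} = R_{bn}, \\
\left.\frac{BD^{j}}{\mathit{PktWt}^{j}}\right|_{\mathit{PktWt}^{j}=\mathit{minPktWt}^{j}}
  &= \frac{BD^{j}}{BD^{j}/B_{bn}} = B_{bn}.
\end{align*}
```

So at the lower end of the range, a single ingress node can at most fill the bottleneck link and the bottleneck buffer, and no more.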

The DemandThresh^j dictates how aggressively the available bandwidth is sought after.
There is a downside to choosing either a small value or a large value for DemandThresh^j.
A small value can lead to frequent PktWt^j changes towards a value
smaller than 1. This is due to the fact that it is highly likely that the average queue size at
the corresponding TBF exceeds the DemandThresh^j.

On the other hand, if the DemandThresh^j is set to a large value, then the transient
available bandwidth is not detected in a timely manner. Moreover, waiting for a large
queue buildup at the TBF can lead to larger bursts injected into the core domain, thereby
increasing the packet loss probability at core nodes (C). (However, this is limited by the
choice of minPktWt^j.)

Based on the above observations, the DemandThresh^j is set to a value

DemandThresh^j <= BD^j / 2    (13)

It should be noted that, in the DCM method, during congestion periods one can
simultaneously reduce both burst size and the rate of packet transmission into the core
domain by making PktWt^j greater than 1. On the other hand, the DCM method can take
advantage of the bounded increase in transmission rates during congestion-free periods.

The following proposition demonstrates the effectiveness of the DCM method in
recovering the Diff-serv domain from a congestion state.

Proposition: In the Diff-serv domain with the Domain-based Congestion Management
(DCM) method, none of the core nodes will be in a congested state indefinitely, if the
feedback messages are never lost.

Proof: By contradiction. Assume that there exists at least one of the core nodes (C) in a
congested state indefinitely. Also assume that the feedback messages are never lost. Let
J denote the set of labels (equivalently, ingress nodes) that share the congested core node
(C) under consideration with bandwidth R_con. Note that by max-min fair share allocation
(configuration) of R^j s, we have Σ_{j∈J} R^j <= R_con. Let t1 be the time that the core node
(C) starts becoming congested. In order for the node to be congested, the condition
Σ_{j∈J} OR^j(t) > R_con, ∀ t > t1, should hold. OR^j(t) denotes the outgoing packet rate at the ingress
node TBF associated with label j, where OR^j(t) = R^j/(PktWt^j(t)). From t1, all the
packets shall be marked with the LCN bit at the congested node. Since feedback
messages are never lost, each of the ingress nodes (I) will be notified by egress nodes (E),
based on the labels (j s) attached to the packets, after t1. Let t2 (> t1) be the time that the
first corresponding ingress node(s) (I) receives the feedback message. Then the ingress
node (I) with PktWt^j(t) < 1 (t2 <= t) by the time it receives the feedback messages will
adjust the PktWt^j s such that PktWt^j(t) > 1 in response to LCN feedback messages.
Meanwhile the ingress node (I) with PktWt^j(t) > 1 (t2 <= t) will increase PktWt^j to the
next higher level of PktWt^j value. Therefore its PktWt^j(t) is still greater than one. There is no
change in PktWt^j for the ingress nodes (I) with PktWt^j(t) = 1. Thus, the total sum of the
transmission rates at the ingress nodes (I) that share the congested core node (C) will be
Σ_{j∈J} OR^j(t') < R_con for some t' > t2. This condition prevails until the average queue size
decays back to a value below minth, such that the outgoing packets are marked
congestion-free (i.e., the LCN bit assumes the value 0). The congested core node (C) is
therefore pushed back to a congestion-free state, and this contradicts the assumption.
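
The key step of the proof, that the aggregate outgoing rate falls below R_con once every PktWt^j is at least 1, can be illustrated numerically. The sketch below is only an example: the function names, the labels "j1" and "j2", and all rate and weight values are hypothetical.

```python
from typing import Dict


def outgoing_rate(token_rate: float, pkt_wt: float) -> float:
    """OR^j = R^j / PktWt^j for one ingress TBF."""
    return token_rate / pkt_wt


def aggregate_rate(token_rates: Dict[str, float], pkt_wts: Dict[str, float]) -> float:
    """Sum of outgoing rates over all labels sharing the congested core node."""
    return sum(outgoing_rate(token_rates[j], pkt_wts[j]) for j in token_rates)


if __name__ == "__main__":
    r_con = 1.0e6                              # bandwidth of the core node
    token_rates = {"j1": 0.5e6, "j2": 0.5e6}   # sum of R^j equals R_con
    before_lcn = {"j1": 0.5, "j2": 0.8}        # aggressive nodes, PktWt < 1
    after_lcn = {"j1": 2.5, "j2": 1.6}         # weights pushed above 1 by LCNs
    print(aggregate_rate(token_rates, before_lcn) > r_con)
    print(aggregate_rate(token_rates, after_lcn) < r_con)
```

Because the R^j sum to no more than R_con, dividing each by a weight of at least 1 (and by more than 1 for at least one label) necessarily brings the total below R_con.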

The above proposition also demonstrates the robustness of the DCM scheme in 
controlling congestion. That is, the DCM scheme continues to operate properly even 
in the presence of non-adaptive, greedy UDP traffic sources, since the method relies
on the network elements (edge nodes) to adjust traffic into the domain rather than resting
responsibility on the end applications. In the current IP network, the adaptive flows have
to compete with non-adaptive flows and therefore are at a disadvantage when some of the 
non-adaptive sources are aggressive in grabbing bandwidth. In the proposed scheme, 
such a drawback is completely eliminated, and the edge routers can appropriately take 
preventive measures based on per-flow fair share. 

Simulation results 

The following are some results of the proposed method. A network simulator (disclosed
at the website www-mash.cs.berkley.edu/ns/, hereby incorporated by reference) was
employed for the simulations. Simulations were carried out based on the configuration
shown in Fig. 9. The link between core routers C1 and C2 is associated with the
improvRED queue that facilitates setting the LCN bit of outgoing packets whenever
the average queue size exceeds FeedbackThreshold.

The minth and maxth of the improvRED are set to 30 and 200 packets
respectively. The FeedbackThreshold is set to a value of 50. The buffer capacity of
improvRED is set to 250 packets. The RTT between corresponding ingress/egress
nodes (i.e., (I1, E1), (I2, E2)) is assumed to be normally distributed, with an average of
30ms and a variation of ± 5ms.
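
With the threshold values above, the phases of improvRED can be sketched in Python as a simple classifier of the average queue size. This is a hedged illustration: the enum and function names are hypothetical, the phase boundaries follow the three-threshold description in the abstract, the handling of an average exactly equal to FeedbackThreshold is an assumption, and the drop probability used in the dropping phase is not modeled.

```python
from enum import Enum


class Phase(Enum):
    NORMAL = "normal"            # no marking, no dropping
    MARKING = "marking"          # outgoing packets marked (potential onset)
    DROPPING = "dropping"        # probabilistic drops, outgoing packets marked
    FORCED_DROP = "forced_drop"  # all incoming packets dropped


def improv_red_phase(avg_queue: float, minth: float = 30.0,
                     feedback_th: float = 50.0, maxth: float = 200.0) -> Phase:
    """Classify the average queue size (in packets) into an improvRED phase."""
    if avg_queue >= maxth:
        return Phase.FORCED_DROP
    if avg_queue > feedback_th:
        return Phase.DROPPING
    if avg_queue > minth:
        return Phase.MARKING
    return Phase.NORMAL
```

For example, an average queue of 40 packets falls in the marking phase, while 120 packets falls in the dropping phase.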

The aggregate traffic sources at both ingress nodes I1, I2 are modeled as Pareto on/off
sources with an average burst time of 500ms and a burst rate of around 0.8Mbps during on
periods. The shape parameter (α) of the Pareto distribution is set to 1.2. The choice of
the Pareto distribution can be justified by its inherent properties such as long bursts and
strong correlation between inter-arrival times of the packets. Packet sizes are fixed at
0.5Kb.

Each of the queues at the ingress TBFs is provided with a buffer of 350 packets. The
DemandThresh at both ingress TBFs is set at 30 packets. Based on the topology, for j = 1, 2,
R^j is set at 0.5Mbps and BD^j is set at 50Kb, with PktWt^j initially set to 1. The minPktWt^j
is estimated to be 0.5. The MTR is calculated to be 0.32 Mbps, and thereby the maxPktWt^j
takes a value of 2.5. The N in Figure 9, representing the number of levels of PktWt
adjustment and determining the total number of discrete states (2N+1) with uniform
mapping, is set to 5. The utilization is defined as follows:

Utilization = (Sum of average incoming traffic rates at ingress TBFs) / (Bottleneck link bandwidth)    (14)

Several runs of simulations were carried out with a typical duration of 6000 secs. This is
several orders of magnitude greater than the RTTs between any given pair of edge
(ingress/egress) nodes (I, E). Therefore, the dynamics of the proposed scheme may be
well captured during this 6000 secs duration. Simulations with longer durations yielded
similar trends as indicated in the rest of this paper.



The table in Fig. 10 depicts the performance comparison of the DCM method against the
non-feedback-based congestion control with RED at core nodes, averaged over several
simulation runs. The figures in the table in Fig. 10 demonstrate that there is a significant
reduction in packet losses with the DCM method over the non-DCM method, especially at
higher utilization factors (above 0.6).

Furthermore, the DCM method is able to take advantage of the early congestion
notification and regulate the inflow of traffic at the ingress TBFs into the core domain,
resulting in a reduction of packet loss at core nodes (C). As the utilization factor is
increased, the packet loss in the domain also increased. The total packet loss has two
components, viz., i) packet loss at core nodes (C) and ii) packet loss at ingress TBF
queues. Hence, as the utilization factor is increased, the demand at the ingress TBFs also
increases correspondingly. This, in turn, results in a decrease in PktWt^j s leading to
higher injection of traffic into the core domain. This results in a slightly higher core
packet loss as the utilization factor is increased. However, the overall packet loss at the
core nodes (C) with the DCM method is still limited to just (absolute) 3% up to a utilization
factor of 1. This is a small penalty incurred due to the bandwidth hunting by the ingress
nodes (I) with minimal support from the core nodes (C) (restricted to setting the LCN bit
during congested periods).

The total system packet loss (core+TBF) is consistently less than the corresponding
packet loss in the non-DCM method, leading to a relative improvement of at least 30% in
packet loss. Even under heavy-load regions (utilization > 1), the packet loss is
substantially lower in the case of the DCM method than in the non-DCM method, by at least
25%, thereby demonstrating the robustness of the DCM method to withstand occasional
heavy loads.

Next, the penalty incurred due to the traffic regulation at the ingress TBFs is evaluated. 
There are two types of penalties at ingress TBFs, viz., a) packet loss due to buffer 
overflow at ingress TBFs, and b) increase in average delay. There is a penalty due to 
buffer overflow at an ingress TBF that increases with input traffic as shown in column 4 
of the table in Fig. 10. The DCM method is able to keep the packet loss at the core nodes 
almost under 3% and push packet loss due to the excess traffic that cannot be 
accommodated at the core nodes, back to the ingress TBFs. Thus, the input aggregated 
flows that aggressively sent traffic into the domain will be penalized more than the others 
with substantial packet loss at the ingress TBFs. This leads to fairness of the DCM 
method in congestion management. As a result, overall better utilization of core 
bandwidth is achieved, as is depicted in the last column of the table in Fig. 10, wherein an
overall relative improvement in packet loss of 30% is obtained using the DCM method over
the non-DCM method.

The other component of the penalty incurred at the ingress TBFs is the ingress queuing
delay. Its statistics are given in the table in Fig. 11 for the DCM method. The average
queuing delay incurred at ingress TBFs increases with an increase in utilization factor.
During congestion periods, the outgoing rate falls back closer to MTR at each of the ingress
nodes, leading to an increase in average queuing delay at the ingress nodes (I). Thus, the
gain in overall improvement in packet loss is at the expense of a slight increase in the
queuing delay incurred at ingress TBFs due to traffic regulation.

The DCM method effectively limits the packet loss at the core domain to below 3%. In
addition, the extra load that cannot be accommodated at the bottleneck core node is either
delayed or dropped at the ingress TBFs depending on buffer occupancy. This
demonstrates that the DCM method can tolerate occasional high demands (utilization >
1) on core bandwidth, and that persistent demand for core bandwidth can lead to greater
penalties at the ingress nodes (I), such as packet loss and increased average delay that is
localized at the ingress TBFs.

Figures 12(a) and 12(b) depict a sample average queue size and the distribution of
PktWt at an ingress TBF for a utilization factor of 0.8. Even though the demand at times
is high, an ingress node is able to hunt the available bandwidth at the core in order to
clear the backlog at the TBF queue. This is further confirmed by the distribution of
PktWt, which shows that the TBF system has been mostly operating with PktWt^j less than 1.
Thus, the system is able to detect the available bandwidth and clear the local backlog
with proper traffic regulation that results in less than 3% packet loss at the core domain.

The distribution of congestion periods (packet dropping phase) for a non-DCM method at
the core node (C) is depicted in Figures 13(a) and 13(b). Some of the congestion periods
lasted for more than a few tens of seconds, thereby incurring potential packet loss for
greater periods. In contrast, most of the congestion periods are confined within the
millisecond time scale, and rarely extend to a small fraction of a second, in the DCM method as
depicted in Figures 14(a) and 14(b). The mean duration of congestion periods of the
non-DCM method is 2.3152 seconds at a utilization factor of 0.8 and
2.7179 seconds at a utilization of 0.9. In comparison, the mean durations of congestion
periods for the DCM method are 0.1849 and 0.2110 seconds for the utilization factors of
0.8 and 0.9 respectively. Thus the DCM method is able to reduce the duration of
congestion periods at least by a factor of 10. This leads to potentially less packet loss at
the core nodes (C) as already elucidated in the table in Fig. 10. The only overhead
incurred is the LCN messages across the ingress (I) and egress nodes (E), which constitute
only a small fraction of bandwidth. Thus, the DCM method is quite robust in confining
potential congestion periods to small durations, thereby improving overall utilization of
core bandwidth.
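
The stated reduction factor can be checked directly from the mean durations quoted above. The short Python sketch below does only that arithmetic; the dictionary and function names are hypothetical.

```python
# Mean congestion-period durations in seconds, keyed by utilization factor,
# as quoted in the text for the non-DCM and DCM methods.
NON_DCM_MEAN_S = {0.8: 2.3152, 0.9: 2.7179}
DCM_MEAN_S = {0.8: 0.1849, 0.9: 0.2110}


def reduction_factor(utilization: float) -> float:
    """Ratio of non-DCM to DCM mean congestion-period duration."""
    return NON_DCM_MEAN_S[utilization] / DCM_MEAN_S[utilization]


if __name__ == "__main__":
    for u in sorted(NON_DCM_MEAN_S):
        print(f"utilization {u}: reduced by a factor of {reduction_factor(u):.1f}")
```

Both ratios come out above 12, which is consistent with the claim of a reduction by at least a factor of 10.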

As noted infra, the RTTs that correspond to various labels (j s or traffic aggregates) are of
the same order. However, in order to assess the impact of the domain-RTT, denoted by the
largest RTT^j of a given Diff-serv domain, the domain-RTT is varied. The smaller RTTs
of other traffic aggregates are taken to be within 5% of the domain-RTT. Figures 15(a) and
15(b) show the impact of domain-RTT on the packet loss performance of the DCM
method. The packet loss percentage at core nodes (C) is approximately bounded by 3%
for the entire practical range of domain-RTT (30-100ms) as depicted in Figure 15(a).
This indicates that a variation in domain-RTT has minimal implication for the DCM
operation in the domain.

However, the total system packet loss (core+TBF) increases slightly with domain-RTT as
depicted in Figure 15(b). This increase is mainly due to the increase in packet loss at the
ingress TBF queue. During the congestion-free phase when the domain-RTT is large, it
takes a longer duration for the TBF to increase its sending rate (or equivalently decrease
the PktWt^j). Thus, it cannot keep up with high incoming traffic at the ingress queue.
This results in more packet loss at the ingress TBF than in the case of lower domain-RTT.

In the case of shorter domain-RTTs, it takes less time for ingress nodes to detect
congestion-free periods via feedback messages. Therefore, they can quickly adjust/increase
the traffic injection rates to cope with upcoming traffic. Thus, one way to cope with long
domain-RTTs is to suitably configure a larger Diff-serv domain into smaller sub-domains
such that the domain-RTTs are made smaller.

To summarize, during potentially strong congestion periods the packet loss is effectively 
reduced in the DCM method through traffic regulation at ingress TBFs. Thus, wasting of 
network resources by undeliverable packets during congestion periods is prevented. As a 
result, the DCM method enhances performance with respect to congestion 
detection/management under higher loads of network traffic. 

In summary, the DCM feedback control method both enhances resource utilization and
controls/manages potential congestion states that occur in a Diff-serv domain. The RED
algorithm is improved by introducing two phases in congestion detection/management,
marking and dropping.

In the marking phase, packets are deterministically marked in the queue whenever the
minth threshold is exceeded by the average queue size. In the dropping phase, packets
are probabilistically dropped whenever the average queue size exceeds the
FeedbackThreshold value. This helps in early local congestion notifications that are sent
by egress nodes (E) to corresponding ingress nodes (I).

Furthermore, a TBF-based adaptive traffic management method is presented that
responds to LCN messages sent from egress nodes (E). The DCM method involves an
exchange of information between edge nodes based on traffic aggregates between the
corresponding ingress/egress nodes (I, E), and not on a per-flow basis. This is one
advantage of the DCM method. Another advantage is quick detection of, and reaction to, local
congestion (-free) notification (LCN) messages from egress routers. The DCM method is
based on a feedback loop control mechanism between ingress (I) and egress nodes (E).
Therefore, the edge routers handle the complexity of traffic regulation.

Simulation results indicate that the method is quite effective in improving the packet loss
ratio under high utilization of network bandwidth. Moreover, the DCM method is simple
and local to the Diff-serv domain and can be easily implemented (due to the single network
administration policy of a Diff-serv domain).

While the invention has been disclosed in this patent application by reference to the
details of preferred embodiments of the invention, it is to be understood that the
disclosure is intended in an illustrative rather than in a limiting sense, as it is
contemplated that modification will readily occur to those skilled in the art, within the
spirit of the invention and the scope of the appended claims and their equivalents.

CLAIMS

What is claimed is:

1) A method of detecting congestion, comprising:

setting at least one threshold value for buffer occupancy; and

dropping all packets when an average queue size exceeds said threshold.

2) The method according to claim 1, wherein said threshold comprises a minimum and a
maximum threshold.

3) The method according to claim 2, further comprising the steps of:

marking packets when said average queue size is between said minimum threshold and
said maximum threshold; and

dropping all packets when an average queue size equals or exceeds said maximum
threshold.

4) The method according to claim 1, wherein said threshold comprises a minimum
threshold, a maximum threshold and a feedback threshold.

5) The method according to claim 4, further comprising the steps of:

marking outgoing packets when said average queue size is between said minimum
threshold and said feedback threshold;

dropping incoming packets and marking said outgoing packets when said average
queue size is between said feedback threshold and said maximum threshold; and

dropping all of said incoming packets when an average queue size equals or exceeds
said maximum threshold.

6) The method according to claim 5, wherein said average queue size is a size of a
bucket of a token bucket filter.






7) The method according to claim 5, wherein said packets are marked deterministically
in said step of marking outgoing packets when said average queue size is between said
minimum threshold and said feedback threshold.

8) The method according to claim 5, wherein said packets are dropped probabilistically 
in said step of dropping incoming packets when said average queue size is between said 
feedback threshold and said maximum threshold. 

9) The method according to claim 5, wherein said step of marking outgoing packets 
when said average queue size is between said minimum threshold and said feedback 
threshold comprises setting at least one bit in at least one of said packets, and said step of 
marking said outgoing packets when said average queue size is between said feedback 
threshold and said maximum threshold comprises setting at least one bit in at least one of 
said packets. 

10) The method according to claim 5, further comprising the step of calculating said
average queue size based on a moving average.

11) The method according to claim 5, further comprising the steps of:
varying a number of tokens consumed by a data packet; and 

transmitting said packet if the number of tokens consumed by said packet is less than or 
equal to available tokens. 

12) The method according to claim 9, wherein said bit is part of a type of services byte. 

13) The method according to claim 9, wherein a core node performs said step of setting 
said bit. 

14) The method according to claim 9, wherein said bit is an LCN bit.

15) The method according to claim 10, wherein said moving average is an exponential
moving average.

16) The method according to claim 11, wherein said step of varying the number of
tokens consumed by a data packet comprises varying the number of tokens consumed by
data of unit size.

17) The method according to claim 11, wherein said step of varying said number of
tokens comprises the step of decreasing data of unit size monotonically if demand
increases monotonically during a congestion free period.

18) The method according to claim 11, wherein said step of varying said number of
tokens comprises the step of increasing data of unit size upon receipt of a message.

19) The method according to claim 12, wherein said type of services byte is a
differentiated services code point byte.

20) The method according to claim 15, wherein said packets are marked
deterministically in said step of marking outgoing packets when said average queue size
is between said minimum threshold and said feedback threshold, wherein said packets are
dropped probabilistically in said step of dropping incoming packets when said average
queue size is between said feedback threshold and said maximum threshold, and wherein
said step of marking outgoing packets when said average queue size is between said
minimum threshold and said feedback threshold comprises a core node setting at least
one bit in at least one of said packets and said step of marking outgoing packets when
said average queue size is between said feedback threshold and said maximum threshold
comprises setting at least one bit in at least one of said packets, wherein said bit is part of
a type of services byte.

21) The method according to claim 18, wherein said step of varying said number of
tokens comprises further decreasing said data of unit size if an average queue size is
greater than a demand threshold.

22) A method of regulating traffic flow between nodes, comprising: 
detecting congestion; 

sending a message to at least one node; 

regulating at least one traffic rate of said at least one node; and 

detecting when congestion is clear. 

23) The method according to claim 22, further comprising the step of incrementing said 
at least one traffic rate at random times when said congestion is clear. 

24) The method according to claim 22, wherein a core node detects said congestion, and 
an output node sends said message to at least one input node. 

25) The method according to claim 22, wherein said step of regulating at least one
traffic rate of said at least one node comprises:

reducing said at least one traffic rate of said at least one node proportional to the amount
of traffic that said at least one node is injecting when said congestion is detected.

26) The method according to claim 22, wherein said step of regulating at least one
traffic rate of said at least one node comprises:

varying a number of tokens consumed by a data packet; and 

transmitting said packet if a number of tokens consumed by said packet is less than or 
equal to available tokens. 

27) The method according to claun 22, wherein said step of sending a message to at 
least one node comprises marking at least one packet 
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28) The method of according to claim 22. wherein said step of regulating at least one 
traffic rate of said at least one node, comprises steps of: 

varying a number of tokens consumed by a data packet by varying the number of tokens 
consumed by data of unit size, comprising the steps of : 

decreasing said data of unit size monotonically if demand increases moaotonically 

during a congestion free period; 

further decreasing said data of unit size if an average queue size is greater than a 
demand threshold; and 

increasing said data of unit size upon receipt of a message; and 
transmitting said packet if a number of tokens consumed by said packet is less than or 
equal to available tokens, wherein said at least one packet is marked deterministically. 

29) The method according to claim 26, wherein said step of varying the number of 
tokens consumed by a data packet comprises varying the number of tokens consumed by 
data of unit size. 

30) The method according to claim 26, wherein said step of varying said number of 
tokens comprises decreasing data of unit size monotonically if demand increases 
monotonically during a congestion free period. 

31) The method according to claim 26, wherein said step of varying said number of 
tokens comprises increasing data of unit size upon receipt of a message. 

32) The method according to claim 27, wherein said packet is marked deterministically. 

33) The method according to claim 27, wherein said step of marking packets comprises 
setting at least one bit in at least one of said packets. 

34) The method according to claim 30, wherein said step of varying said number of 
tokens comprises further decreasing said data of unit size if an average queue size is 
greater than a demand threshold. 

35) The method according to claim 33, wherein said bit is part of a type of services byte. 

36) A method of controlling traffic flow in a differentiated services domain, comprising: 
varying a number of tokens consumed by a data packet; and 

transmitting said data packet if a number of tokens consumed by said data packet is less 
than or equal to available tokens. 

37) The method according to claim 36, wherein said step of varying the number of 
tokens consumed by a data packet comprises varying the number of tokens consumed by 
data of unit size. 

38) The method according to claim 36, wherein said number of tokens consumed by a 
data packet is varied based on state and demand. 

39) The method according to claim 36, further comprising steps of: 

a first step of determining available bandwidth by calculating an average queue size at a 
token bucket filter; and 

a second step of determining if said average queue size is greater than a demand 
threshold. 

40) The method according to claim 36, wherein said step of varying said number of 
tokens consumed by a data packet comprises a step of decreasing data of unit size 
monotonically if demand increases monotonically during a congestion free period. 

41) The method according to claim 36, wherein said step of varying said number of 
tokens comprises a step of increasing data of unit size upon receipt of a message. 

42) The method according to claim 37, wherein said step of varying the number of 
tokens consumed by data of unit size comprises varying the number of tokens consumed 
by a byte of data. 

43) The method according to claim 37, wherein said step of varying the number of 
tokens consumed by data of unit size comprises varying the number of tokens consumed 
by a bit of data. 

44) The method according to claim 37, wherein a minimum number of tokens consumed 
by said data of unit size equals a depth of a token bucket filter divided by a min-max fair 
share of the buffer. 

45) The method according to claim 37, wherein a minimum number of tokens consumed 
by said data of unit size equals a depth of a token generation rate divided by a min-max 
fair share of the bandwidth. 

46) The method according to claim 37, wherein a maximum number of tokens 
consumed by said data of unit size equals a depth of a token bucket filter divided by a 
maximum transmission unit. 

47) The method according to claim 37, wherein a maximum number of tokens 
consumed by said data of unit size equals a token generation rate divided by a minimum 
data buffer drain rate. 

48) The method according to claim 37, wherein said step of varying the number of 
tokens consumed by data of unit size comprises the steps of: 

decreasing said data of unit size monotonically if demand increases monotonically 
during a congestion free period; 

further decreasing said data of unit size if an average queue size is greater than a 
demand threshold; and 

increasing said data of unit size upon receipt of a message, wherein said number of 
tokens consumed by a data packet is varied based on state and demand. 

49) The method according to claim 38, wherein said state is a congestion state. 

50) The method according to claim 40, wherein said step of varying said number of 
tokens comprises further decreasing said data of unit size if an average queue size is 
greater than a demand threshold. 

51) An apparatus for controlling traffic flow, comprising: 
a plurality of nodes; and 

at least one token bucket filter corresponding to at least one of said nodes. 

52) The apparatus according to claim 51, wherein said token bucket filter comprises: 
a token generator; and 

a bucket operably connected to said token generator to hold at least one generated token. 

53) The apparatus according to claim 52, wherein said nodes comprise: 
at least one input node; 

at least one output node; and 
at least one core node. 

54) The apparatus according to claim 53, wherein said at least one core node does not 
maintain per-flow state information and carries a large number of aggregated flows; and 
wherein said output node and input node maintain per-flow state. 

55) The apparatus according to claim 53, wherein said nodes are part of a domain. 

56) The apparatus according to claim 53, wherein said nodes are part of an Internet. 

57) The apparatus according to claim 55, wherein said domain is a differentiated 
services domain. 
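
The bucket-depth bounds recited in claims 44 and 46 can be illustrated with a small worked calculation. This is only a sketch: the function name, the parameter names, the choice of bytes as the unit of data, and the numbers in the usage note are assumptions made for illustration, not values taken from the specification.

```python
def pkt_wt_bounds(bucket_depth, fair_share_buffer, mtu):
    """Return (min, max) tokens consumed per unit of data.

    Assumed units: bucket_depth in tokens, fair_share_buffer and mtu in bytes,
    so both results are in tokens per byte.
    """
    # Claim 44: minimum = bucket depth / fair share of the buffer.
    min_wt = bucket_depth / fair_share_buffer
    # Claim 46: maximum = bucket depth / maximum transmission unit.
    max_wt = bucket_depth / mtu
    return min_wt, max_wt


def clamp_pkt_wt(pkt_wt, min_wt, max_wt):
    """Keep a per-unit token cost inside the [min, max] interval."""
    return max(min_wt, min(max_wt, pkt_wt))
```

With a hypothetical bucket depth of 15000 tokens, a fair share of 60000 bytes of buffer, and a 1500-byte maximum transmission unit, `pkt_wt_bounds(15000, 60000, 1500)` gives a minimum of 0.25 and a maximum of 10.0 tokens per byte.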
COMPONENTS OF A TOKEN BUCKET FILTER 

FIG. 5B 
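
A minimal sketch of the two components named in claim 52 (a token generator and a bucket), combined with the transmit test of claim 36, might look as follows. The class name, the field names, and the use of a per-byte token cost are assumptions made for illustration; the figure itself is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class TokenBucketFilter:
    """Token generator plus bucket, with a variable per-byte token cost."""

    depth: float          # bucket capacity in tokens (assumed name)
    rate: float           # tokens generated per second (assumed name)
    pkt_wt: float = 1.0   # tokens consumed per byte of data
    tokens: float = 0.0   # tokens currently held in the bucket

    def generate(self, elapsed_s: float) -> None:
        """Token generator: add tokens for the elapsed time, capped at depth."""
        self.tokens = min(self.depth, self.tokens + self.rate * elapsed_s)

    def try_send(self, packet_bytes: int) -> bool:
        """Transmit only if the packet's token cost is <= available tokens."""
        cost = packet_bytes * self.pkt_wt
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False
```

Raising `pkt_wt` makes each packet cost more tokens, so the same token generation rate admits less traffic; lowering it has the opposite effect.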
Initialize: PktWt_0^j ← 1.0 

PktWt^j is always within [minPktWt^j, maxPktWt^j] 

MD is a monotonously decreasing function that takes a value in (0, 1) 

MI is a monotonously increasing function that takes a positive value 

j denotes the label corresponding to the fixed route between a given pair 
of ingress/egress nodes 

for every ith round trip time (between ingress and egress nodes) 

    During congestion-free periods 

        if (average TBF queue size at ingress node > DemandThrsh^j) 

            PktWt_i^j ← PktWt_{i-1}^j * MD(PktWt_{i-1}^j)    (230) 

            /* decrease the PktWt^j during congestion-free periods, based on demand 
               at TBF */ 

        else { 

            if (PktWt_{i-1}^j > 1) PktWt_i^j ← max[1, PktWt_{i-1}^j * MD(PktWt_{i-1}^j)] 

            if (PktWt_{i-1}^j < 1) PktWt_i^j ← min[1, PktWt_{i-1}^j * MI(PktWt_{i-1}^j)] 

            /* restore PktWt^j close to 1.0 */ 

        } 

    At congestion notification time 

        PktWt_i^j ← (1 - PktWt_{i-1}^j) * (maxPktWt^j - 1) / (1 - minPktWt^j) + 1   if PktWt_{i-1}^j < 1 

        /* The smaller the PktWt^j just before LCN, the bigger it will be during 
           congestion period. A uniform mapping of [minPktWt^j, 1) on to 
           (1, maxPktWt^j] intervals */    (250) 

    During congestion period 

        PktWt_i^j ← PktWt_{i-1}^j * MI(PktWt_{i-1}^j)   if PktWt_{i-1}^j ≥ 1    (240) 

    On receipt of congestion clearance notification 

        Select a random time less than RTT and, 

        PktWt_i^j ← PktWt_{i-1}^j * MD(PktWt_{i-1}^j)    (220) 



THE TBF-BASED CONGESTION MANAGEMENT ALGORITHM AT INGRESS NODES 

FIG. 6B 
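
The per-route packet-weight updates of FIG. 6B can be sketched in Python roughly as below. This is an illustrative reading, not a definitive implementation: the class and method names are hypothetical, MD and MI are supplied by the caller, the linear mapping applied at congestion notification is an assumption based on the figure's "uniform mapping of [minPktWt, 1) on to (1, maxPktWt]" comment, and the random delay before the clearance update is returned to the caller rather than scheduled.

```python
import random


class PktWtController:
    """Tracks PktWt for one ingress/egress route (label j in the figure)."""

    def __init__(self, min_wt, max_wt, md, mi):
        self.min_wt, self.max_wt = min_wt, max_wt
        self.md, self.mi = md, mi      # caller-supplied MD and MI functions
        self.wt = 1.0                  # Initialize: PktWt = 1.0
        self.congested = False

    def _set(self, value):
        # PktWt is always kept within [minPktWt, maxPktWt].
        self.wt = max(self.min_wt, min(self.max_wt, value))

    def on_round_trip(self, avg_tbf_queue, demand_thresh):
        """Update applied once per round trip time."""
        w = self.wt
        if self.congested:
            if w >= 1:                                 # step 240
                self._set(w * self.mi(w))
        elif avg_tbf_queue > demand_thresh:            # step 230
            self._set(w * self.md(w))
        elif w > 1:                                    # restore toward 1.0
            self._set(max(1.0, w * self.md(w)))
        elif w < 1:
            self._set(min(1.0, w * self.mi(w)))

    def on_congestion_notification(self):
        """Step 250: map [min_wt, 1) linearly onto (1, max_wt] (assumed form)."""
        w = self.wt
        if w < 1:
            self._set((1 - w) * (self.max_wt - 1) / (1 - self.min_wt) + 1)
        self.congested = True

    def pick_clear_delay(self, rtt):
        """Random time in [0, rtt) to wait before calling apply_clear()."""
        return random.random() * rtt

    def apply_clear(self):
        """Step 220: decrease PktWt once congestion has cleared."""
        self._set(self.wt * self.md(self.wt))
        self.congested = False
```

For example, with the placeholder choices `md=lambda w: 1 / (1 + w)` and `mi=lambda w: 1 + w`, a weight that has drifted down to 0.5 under bounds [0.5, 4.0] jumps to 4.0 when a congestion notification arrives, which matches the figure's remark that a smaller weight before the notification yields a larger one during the congestion period.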